Coordinate Descent on the Orthogonal Group for Recurrent Neural Network Training

Authors

Abstract

We address the poor scalability of learning algorithms for orthogonal recurrent neural networks via the use of stochastic coordinate descent on the orthogonal group, leading to a cost per iteration that increases linearly with the number of states. This contrasts with the cubic dependency of typical feasible algorithms such as Riemannian gradient descent, which prohibits the use of big network architectures. Coordinate descent rotates successively two columns of the recurrent matrix. When the coordinate (i.e., the indices of the rotated columns) is selected uniformly at random at each iteration, we prove convergence of the algorithm under standard assumptions on the loss function, stepsize and minibatch noise. In addition, we numerically show that the Riemannian gradient has an approximately sparse structure. Leveraging this observation, we propose a variant of our algorithm that relies on the Gauss-Southwell selection rule. Experiments on a benchmark recurrent neural network training problem show that the proposed approach is a very promising step towards the scalable training of orthogonal recurrent neural networks.
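The update described in the abstract can be illustrated with a minimal sketch: draw a pair of column indices uniformly at random, compute the derivative of the loss with respect to the rotation angle, and apply the corresponding Givens rotation to the two columns. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the function name, the gradient interface and the stepsize handling are hypothetical.

```python
import numpy as np

def givens_coordinate_step(W, euclid_grad, lr, rng):
    """One stochastic coordinate-descent step on the orthogonal group (sketch).

    W           : (n, n) orthogonal recurrent matrix, updated in place.
    euclid_grad : callable returning the Euclidean gradient dL/dW at W.
    lr          : stepsize.
    rng         : numpy Generator; the coordinate (column pair) is drawn
                  uniformly at random, as in the randomized variant.
    """
    n = W.shape[0]
    i, j = rng.choice(n, size=2, replace=False)   # coordinate = pair of column indices
    G = euclid_grad(W)
    # Derivative of the loss w.r.t. the rotation angle theta at theta = 0,
    # when columns i and j are rotated jointly by a Givens rotation:
    dtheta = G[:, i] @ W[:, j] - G[:, j] @ W[:, i]
    theta = -lr * dtheta                          # gradient step on the angle
    c, s = np.cos(theta), np.sin(theta)
    wi, wj = W[:, i].copy(), W[:, j].copy()
    W[:, i] = c * wi + s * wj                     # rotate the two selected columns:
    W[:, j] = -s * wi + c * wj                    # O(n) work, orthogonality preserved
    return W
```

Under this reading, a Gauss-Southwell variant would replace the uniform draw of (i, j) by a greedy choice of the pair with the largest angle derivative in magnitude, which is where the observed sparsity of the Riemannian gradient becomes useful.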


Similar articles

Recurrent neural network training with preconditioned stochastic gradient descent

Recurrent neural networks (RNN), especially the ones requiring extremely long-term memories, are difficult to train. Hence, they provide an ideal testbed for benchmarking the performance of optimization algorithms. This paper reports test results of a recently proposed preconditioned stochastic gradient descent (PSGD) algorithm on RNN training. We find that PSGD may outperform Hessian-free o...


On Descent Spectral CG Algorithm for Training Recurrent Neural Networks

In this paper, we evaluate the performance of a new class of conjugate gradient methods for training recurrent neural networks which ensure the sufficient descent property. The presented methods preserve the advantages of classical conjugate gradient methods and simultaneously avoid the usually inefficient restarts. Simulation results are also presented using three different recurrent neural ne...
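For reference, the sufficient descent property mentioned in this snippet is usually stated as the following condition on the search direction; this is the standard textbook form and may differ in constants from the cited paper's exact statement.

```latex
% Sufficient descent condition for the search direction d_k at iterate w_k,
% with gradient g_k = \nabla E(w_k) and a constant c > 0 independent of k:
d_k^\top g_k \le -c\,\|g_k\|^2
```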


Accelerating Recurrent Neural Network Training

An efficient algorithm for recurrent neural network training is presented. The approach increases the training speed for tasks where the length of the input sequence may vary significantly. The proposed approach is based on optimal batch bucketing by input sequence length and data parallelization on multiple graphics processing units. The baseline training performance without sequence bucket...
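A minimal illustration of the bucketing idea (a hypothetical helper, not the cited paper's implementation): group training sequences by length so that each minibatch needs little padding.

```python
from collections import defaultdict

def bucket_by_length(sequences, batch_size, bucket_width=10):
    """Group variable-length sequences into buckets of similar length,
    then cut each bucket into minibatches (illustrative sketch only)."""
    buckets = defaultdict(list)
    for seq in sequences:
        # Sequences whose lengths fall in the same window share a bucket,
        # so padding inside a minibatch stays small.
        buckets[len(seq) // bucket_width].append(seq)
    batches = []
    for bucket in buckets.values():
        bucket.sort(key=len)
        for start in range(0, len(bucket), batch_size):
            batches.append(bucket[start:start + batch_size])
    return batches
```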


Efficient coordinate-descent for orthogonal matrices through Givens rotations

Optimizing over the set of orthogonal matrices is a central component in problems like sparse-PCA or tensor decomposition. Unfortunately, such optimization is hard since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. Here we propose a framework for optimizing orthogonal matrices, that is the parallel...
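One common way to write the rotation underlying this framework is the Givens matrix below; since the rotation is itself orthogonal, multiplying by it keeps the iterate on the orthogonal group exactly, with no re-orthogonalization step. The notation is illustrative and not taken from the cited paper.

```latex
% Givens rotation acting on coordinates (i, j) by angle \theta:
G(i, j, \theta) = I + (\cos\theta - 1)\,(e_i e_i^\top + e_j e_j^\top)
                    + \sin\theta\,(e_j e_i^\top - e_i e_j^\top)
% G(i, j, \theta) is orthogonal, so W \leftarrow W\,G(i, j, \theta) stays orthogonal.
```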


Coordinate-descent for learning orthogonal matrices through Givens rotations

Optimizing over the set of orthogonal matrices is a central component in problems like sparse-PCA or tensor decomposition. Unfortunately, such optimization is hard since simple operations on orthogonal matrices easily break orthogonality, and correcting orthogonality usually costs a large amount of computation. Here we propose a framework for optimizing orthogonal matrices, that is the parallel ...



Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i7.20742